{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# RandomHoldoutSplit"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Randomly split a dataset into a train and validation subset for validation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> `from mlxtend.evaluate import RandomHoldoutSplit`    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `RandomHoldoutSplit` class serves as an alternative to scikit-learn's `KFold` class, where the `RandomHoldoutSplit` class splits a dataset into training and a validation subsets without rotation. The `RandomHoldoutSplit` can be used as argument for `cv` parameters in scikit-learn's `GridSearchCV` etc.\n",
    "\n",
    "The term \"random\" in `RandomHoldoutSplit` comes from the fact that the split is specified by the `random_seed` rather than specifying the training and validation set indices manually as in the `PredefinedHoldoutSplit` class in mlxtend."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 1 -- Iterating Over a RandomHoldoutSplit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n"
     ]
    }
   ],
   "source": [
    "from mlxtend.evaluate import RandomHoldoutSplit\n",
    "from mlxtend.data import iris_data\n",
    "\n",
    "X, y = iris_data()\n",
    "h_iter = RandomHoldoutSplit(valid_size=0.3, random_seed=123)\n",
    "\n",
    "cnt = 0\n",
    "for train_ind, valid_ind in h_iter.split(X, y):\n",
    "    cnt += 1\n",
    "    print(cnt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 60  16  88 130   6]\n",
      "[ 72 125  80  86 117]\n"
     ]
    }
   ],
   "source": [
    "print(train_ind[:5])\n",
    "print(valid_ind[:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 2 -- RandomHoldoutSplit in GridSearch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[mean: 0.95556, std: 0.00000, params: {'n_neighbors': 1}, mean: 0.95556, std: 0.00000, params: {'n_neighbors': 2}, mean: 0.95556, std: 0.00000, params: {'n_neighbors': 3}, mean: 0.95556, std: 0.00000, params: {'n_neighbors': 4}, mean: 0.95556, std: 0.00000, params: {'n_neighbors': 5}]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/sebastian/miniconda3/lib/python3.6/site-packages/sklearn/model_selection/_search.py:762: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20\n",
      "  DeprecationWarning)\n"
     ]
    }
   ],
   "source": [
    "from sklearn.model_selection import GridSearchCV\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "from mlxtend.evaluate import RandomHoldoutSplit\n",
    "from mlxtend.data import iris_data\n",
    "\n",
    "X, y = iris_data()\n",
    "\n",
    "params = {'n_neighbors': [1, 2, 3, 4, 5]}\n",
    "\n",
    "grid = GridSearchCV(KNeighborsClassifier(),\n",
    "                    param_grid=params,\n",
    "                    cv=RandomHoldoutSplit(valid_size=0.3, random_seed=123))\n",
    "\n",
    "grid.fit(X, y)\n",
    "\n",
    "assert grid.n_splits_ == 1\n",
    "print(grid.grid_scores_)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "## RandomHoldoutSplit\n",
      "\n",
      "*RandomHoldoutSplit(valid_size=0.5, random_seed=None, stratify=False)*\n",
      "\n",
      "Train/Validation set splitter for sklearn's GridSearchCV etc.\n",
      "\n",
      "Provides train/validation set indices to split a dataset\n",
      "into train/validation sets using random indices.\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "- `valid_size` : float (default: 0.5)\n",
      "\n",
      "    Proportion of examples that being assigned as\n",
      "    validation examples. 1-`valid_size` will then automatically\n",
      "    be assigned as training set examples.\n",
      "\n",
      "- `random_seed` : int (default: None)\n",
      "\n",
      "    The random seed for splitting the data\n",
      "    into training and validation set partitions.\n",
      "\n",
      "- `stratify` : bool (default: False)\n",
      "\n",
      "    True or False, whether to perform a stratified\n",
      "    split or not\n",
      "\n",
      "### Methods\n",
      "\n",
      "<hr>\n",
      "\n",
      "*get_n_splits(X=None, y=None, groups=None)*\n",
      "\n",
      "Returns the number of splitting iterations in the cross-validator\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "- `X` : object\n",
      "\n",
      "    Always ignored, exists for compatibility.\n",
      "\n",
      "\n",
      "- `y` : object\n",
      "\n",
      "    Always ignored, exists for compatibility.\n",
      "\n",
      "\n",
      "- `groups` : object\n",
      "\n",
      "    Always ignored, exists for compatibility.\n",
      "\n",
      "**Returns**\n",
      "\n",
      "- `n_splits` : 1\n",
      "\n",
      "    Returns the number of splitting iterations in the cross-validator.\n",
      "    Always returns 1.\n",
      "\n",
      "<hr>\n",
      "\n",
      "*split(X, y, groups=None)*\n",
      "\n",
      "Generate indices to split data into training and test set.\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "- `X` : array-like, shape (num_examples, num_features)\n",
      "\n",
      "    Training data, where num_examples is the number of\n",
      "    training examples and num_features is the number of features.\n",
      "\n",
      "\n",
      "- `y` : array-like, shape (num_examples,)\n",
      "\n",
      "    The target variable for supervised learning problems.\n",
      "    Stratification is done based on the y labels.\n",
      "\n",
      "\n",
      "- `groups` : object\n",
      "\n",
      "    Always ignored, exists for compatibility.\n",
      "\n",
      "**Yields**\n",
      "\n",
      "- `train_index` : ndarray\n",
      "\n",
      "    The training set indices for that split.\n",
      "\n",
      "\n",
      "- `valid_index` : ndarray\n",
      "\n",
      "    The validation set indices for that split.\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "with open('../../api_modules/mlxtend.evaluate/RandomHoldoutSplit.md', 'r') as f:\n",
    "    s = f.read() \n",
    "print(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
